178 research outputs found

    Ant Colony Algorithm Applied to Automatic Speech Recognition Graph Decoding

    Get PDF
    International audience
    In this article we propose an original approach to decoding Automatic Speech Recognition (ASR) graphs with a constructive algorithm based on ant colonies. In classical approaches, when a graph is decoded with a higher-order language model, the algorithm must expand the graph in order to develop each newly observed n-gram. This expansion increases computation time and memory consumption. We propose an ant colony algorithm that explores ASR graphs with a new language model without expanding them. We first present results on the TED English corpus, where 2-gram graphs are decoded with a 4-gram language model. We then show that our approach outperforms a conventional Viterbi algorithm when computation time is constrained, and that it allows a highly threaded decoding process on a single graph with strict control of computation time and memory consumption.
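    The central idea of the abstract above (ants traverse an unexpanded 2-gram lattice while paths are scored with a higher-order language model and pheromone accumulates on good edges) can be pictured with the following minimal sketch. It is not the authors' implementation: the lattice representation, the lm_score callable, and all parameter values are assumptions made for illustration.

```python
# Minimal sketch of ant-colony decoding over an un-expanded ASR word lattice.
# Assumptions (not from the paper): lattice is {node: [(word, next_node, acoustic_logprob)]},
# every node reaches `end`, and lm_score(history_tuple, word) returns a log-probability
# from any higher-order language model.
import math
import random
from collections import defaultdict

def aco_decode(lattice, start, end, lm_score, n_ants=20, n_iter=30,
               evaporation=0.1, alpha=1.0, beta=1.0, seed=0):
    rng = random.Random(seed)
    pheromone = defaultdict(lambda: 1.0)          # one value per lattice edge
    best_path, best_score = None, -math.inf

    for _ in range(n_iter):
        iter_best, iter_score = None, -math.inf
        for _ in range(n_ants):
            node, words, score, edges = start, [], 0.0, []
            while node != end:
                arcs = lattice[node]
                # Edge attractiveness: pheromone^alpha * exp(local score)^beta,
                # where the local score mixes acoustics and the high-order LM.
                weights = []
                for word, nxt, ac in arcs:
                    local = ac + lm_score(tuple(words[-3:]), word)  # 4-gram history
                    weights.append((pheromone[(node, nxt, word)] ** alpha) *
                                   (math.exp(local) ** beta))
                total = sum(weights) or 1.0
                r, acc, pick = rng.random() * total, 0.0, arcs[-1]
                for arc, w in zip(arcs, weights):
                    acc += w
                    if r <= acc:
                        pick = arc
                        break
                word, nxt, ac = pick
                score += ac + lm_score(tuple(words[-3:]), word)
                words.append(word)
                edges.append((node, nxt, word))
                node = nxt
            if score > iter_score:
                iter_best, iter_score = (words, edges), score
        # Evaporate everywhere, then reinforce the iteration-best path.
        for k in list(pheromone):
            pheromone[k] *= (1.0 - evaporation)
        words, edges = iter_best
        for e in edges:
            pheromone[e] += 1.0
        if iter_score > best_score:
            best_path, best_score = words, iter_score
    return best_path, best_score

# Toy usage: a three-node lattice and a uniform "4-gram" LM stand-in.
toy = {0: [("hello", 1, -1.0), ("yellow", 1, -2.0)], 1: [("world", 2, -1.0)]}
print(aco_decode(toy, 0, 2, lambda hist, w: -0.5))
```

    Because each ant carries its own n-gram history, the lattice itself never has to be expanded, which is what keeps memory consumption bounded.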

    Reconnaissance d'ordres domotiques en conditions bruitées pour l'assistance à domicile (Recognition of Voice Commands by Multisource ASR and Noise Cancellation in a Smart Home Environment) [in French]

    No full text
    National audience
    In this article, we present an automatic speech recognition system dedicated to recognizing home automation commands in a smart home under real, noisy conditions. The system uses a state-of-the-art noise cancellation stage. The proposed system is evaluated on audio data acquired in a smart home where microphones were placed close to the noise sources (radio, music, etc.) as well as in the ceilings of the different rooms. This audio corpus was recorded with 23 speakers uttering colloquial, distress, or home automation sentences. Decoding techniques using a priori knowledge give results in noisy conditions comparable to those obtained in normal conditions, which makes them viable for real-world use. However, the noise cancellation stage appears much more effective at cancelling noise coming from the radio (speech) than music.

    Recognition of Voice Commands by Multisource ASR and Noise Cancellation in a Smart Home Environment

    No full text
    International audience
    In this paper, we present a multisource ASR system to detect home automation orders in various everyday listening conditions in a realistic home. The system is based on a state-of-the-art echo cancellation stage that feeds recently introduced ASR techniques. The evaluation was conducted on a realistic noisy data set acquired in a smart home where a microphone was placed near the noise source and several other microphones were placed in different rooms. This distant speech corpus was recorded with 23 speakers uttering colloquial or distress sentences as well as home automation orders. Techniques acting at the decoding stage and using a priori knowledge gave the best results in noisy conditions compared to the baseline (recall = 93.2% vs. 59.2%), reaching performance good enough for real usage, although improvements still need to be made when music is used as background noise.
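    As a rough picture of the echo-cancellation stage mentioned above, assuming the channel recorded near the noise source (radio, music) is usable as a reference signal, a textbook normalized-LMS adaptive filter would subtract the estimated echo from a distant microphone as sketched below. This is a generic illustration, not the project's actual front end; the function name, filter length and step size are invented.

```python
# Generic NLMS echo cancellation sketch: remove the reference (e.g., radio) contribution
# from a distant microphone so that the residual mostly contains speech.
import numpy as np

def nlms_cancel(distant_mic, reference, filter_len=256, mu=0.1, eps=1e-6):
    """distant_mic, reference: 1-D float arrays of equal length; returns the cleaned signal."""
    w = np.zeros(filter_len)                      # adaptive echo-path estimate
    out = np.zeros_like(distant_mic)              # first filter_len samples stay zero
    for n in range(filter_len, len(distant_mic)):
        x = reference[n - filter_len:n][::-1]     # most recent reference samples
        echo_est = np.dot(w, x)
        e = distant_mic[n] - echo_est             # residual = (hopefully) speech
        w += mu * e * x / (np.dot(x, x) + eps)    # normalized LMS update
        out[n] = e
    return out

# Toy usage: the cleaned signal should attenuate the echoed reference and keep the "speech".
fs = 16000
t = np.arange(fs) / fs
reference = np.sin(2 * np.pi * 440 * t)           # "radio" playing a tone
speech = 0.5 * np.sin(2 * np.pi * 120 * t)        # stand-in for speech
distant = speech + 0.8 * np.roll(reference, 40)   # echoed reference plus speech
cleaned = nlms_cancel(distant, reference)
```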

    Sense Embeddings in Knowledge-Based Word Sense Disambiguation

    Get PDF
    International audience

    Reconnaissance automatique de la parole guidée par des transcriptions a priori (Automatic Speech Recognition Driven by A Priori Transcripts) [in French]

    Get PDF
    Robustness in speech recognition refers to the need to maintain high recognition accuracy even when the quality of the input speech is degraded. In the last decade, several papers have proposed using relevant metadata to enhance the recognition process. In many cases, an imperfect a priori transcript can be associated with the speech signal: movie subtitles, scenarios and theatrical plays, summaries, and radio broadcasts. This thesis addresses the issue of using such imperfect transcripts to improve the performance of automatic speech recognition (ASR) systems. Unfortunately, these a priori transcripts seldom correspond to the exact word utterances and lack temporal information. In spite of their varying quality, we show how to use them to improve ASR systems. In the first part of the document, we propose to integrate the imperfect transcripts inside the ASR search algorithm. We present a method that allows an automatic speech recognition system to be driven by prompts or subtitles. This driven decoding algorithm relies on on-demand synchronization and on linguistic rescoring of ASR hypotheses. In order to handle transcript excerpts, we suggest a method for extracting segments from large corpora. The second part presents the Driven Decoding Algorithm (DDA) approach to combining several ASR systems: it consists in guiding the search algorithm of a primary ASR system with the one-best hypotheses of auxiliary systems. Our work suggests using auxiliary information directly inside an ASR system. The driven decoding algorithm enhances the baseline system and improves the a priori transcription. Moreover, the new combination schemes based on generalized DDA significantly outperform state-of-the-art combinations.
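    A crude offline way to picture driven decoding is as a rescoring pass in which every hypothesis word that can be aligned to the imperfect a priori transcript receives a linguistic bonus; the real DDA applies this inside the search, with on-demand synchronization. The sketch below is only illustrative: the alignment via difflib and the single boost weight gamma are assumptions.

```python
# Illustrative rescoring in the spirit of driven decoding: hypothesis words that align
# with an imperfect a priori transcript (subtitles, prompts...) get a linguistic bonus.
from difflib import SequenceMatcher

def driven_rescore(hypothesis, lm_scores, a_priori_transcript, gamma=2.0):
    """hypothesis: list of words; lm_scores: per-word log LM scores;
    a_priori_transcript: list of words from the imperfect transcript; gamma: boost weight."""
    matcher = SequenceMatcher(a=hypothesis, b=a_priori_transcript, autojunk=False)
    matched = set()
    for block in matcher.get_matching_blocks():
        matched.update(range(block.a, block.a + block.size))
    # Boost the linguistic score of every word confirmed by the transcript.
    return sum(s + (gamma if i in matched else 0.0)
               for i, (w, s) in enumerate(zip(hypothesis, lm_scores)))

# Toy usage: the hypothesis that agrees with the subtitle overtakes the other one.
subtitle = "the cat sat on the mat".split()
h1, s1 = "the cap sat on a mat".split(), [-1.0] * 6
h2, s2 = "the cat sat on the mat".split(), [-1.2] * 6
print(driven_rescore(h1, s1, subtitle), driven_rescore(h2, s2, subtitle))
```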

    Reconnaissance automatique de la parole distante dans un habitat intelligent : méthodes multi-sources en conditions réalistes (Distant Speech Recognition in a Smart Home: Comparison of Several Multisource ASRs in Realistic Conditions) [in French]

    No full text
    International audience
    The smart home field has developed with the aim of improving assistance to people losing their autonomy. Automatic speech recognition (ASR) is beginning to be used, but it still lags behind other technologies. We present the Sweet-Home project, whose objective is voice control of the home environment. Several approaches, both state of the art and new, are evaluated on data recorded under realistic conditions. The distant speech corpus, recorded with 21 speakers, simulates scenarios involving daily activities in an apartment equipped with several microphones. Techniques operating during decoding and using a priori knowledge obtain very promising results compared with a classical ASR system.

    Multichannel Automatic Recognition of Voice Command in a Multi-Room Smart Home: An Experiment involving Seniors and Users with Visual Impairment

    No full text
    International audience
    Voice command systems in multi-room smart homes, intended to assist people losing autonomy in their daily activities, must face several challenges, one of which is the distant condition, which impacts ASR system performance. This paper presents an approach to improve voice command recognition at the decoding level by using multiple sources and model adaptation. The method has been tested on data recorded with 11 elderly and visually impaired participants in a real smart home. The results show an error rate of 3.2% in the off-line condition and 13.2% in the on-line condition.

    Distant Speech Recognition for Home Automation: Preliminary Experimental Results in a Smart Home

    No full text
    International audience
    This paper presents a study, part of the Sweet-Home project, which aims at developing a new home automation system based on voice commands. The study focused on two tasks: distant speech recognition and sentence spotting (e.g., recognition of domotic orders). For the first task, different combinations of ASR systems, language models and acoustic models were tested. Fusion of ASR outputs by consensus and with a triggered language model (using a priori knowledge) was investigated. For the sentence spotting task, an algorithm based on distance evaluation between the current ASR hypotheses and a predefined set of keyword patterns was introduced in order to retrieve the correct sentences in spite of ASR errors. The techniques were assessed on real daily-living data collected in a 4-room smart home fully equipped with standard tactile commands and with 7 wireless microphones set in the ceiling. Thanks to Driven Decoding Algorithm techniques, a classical ASR system reached 7.9% WER, against 35% WER in the standard configuration and 15% with MLLR adaptation only. The best keyword pattern classification result obtained in distant speech conditions was 7.5% CER.
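    The sentence-spotting step described above (retrieving the intended domotic order despite ASR errors) can be approximated by selecting the predefined pattern at minimum normalized edit distance from the hypothesis and rejecting utterances whose best match is still too far. The sketch below is a hypothetical illustration of that idea, not the project's code; the command list and the rejection threshold are made up.

```python
# Hedged sketch of keyword-pattern spotting: choose the predefined command whose word
# sequence is closest (normalized Levenshtein distance) to the ASR hypothesis, and reject
# the utterance when even the best match is too far away.
def word_edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance over word tokens.
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (wa != wb)))
        prev = cur
    return prev[-1]

def spot_command(asr_hypothesis, patterns, reject_above=0.4):
    hyp = asr_hypothesis.lower().split()
    best, best_d = None, float("inf")
    for p in patterns:
        pat = p.lower().split()
        d = word_edit_distance(hyp, pat) / max(len(pat), 1)
        if d < best_d:
            best, best_d = p, d
    return best if best_d <= reject_above else None

# Toy usage with invented home-automation orders.
orders = ["turn on the light", "close the blinds", "call for help"]
print(spot_command("turn on de light", orders))   # -> "turn on the light"
print(spot_command("what time is it", orders))    # -> None
```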